    Incremental Object Database: Building 3D Models from Multiple Partial Observations

    Collecting 3D object datasets involves a large amount of manual work and is time-consuming. Obtaining complete object models requires either a 3D scanner that covers all surfaces of an object or rotating the object so that it can be observed completely. We present a system that incrementally builds a database of objects as a mobile agent traverses a scene. Our approach requires no prior knowledge of the shapes present in the scene. Object-like segments are extracted from a global segmentation map, which is built online from the input of segmented RGB-D images. These segments are stored in a database, matched against each other, and merged with previously observed instances. This allows us to create and improve object models on the fly and to use these merged models also to reconstruct unobserved parts of the scene. The database contains each (potentially merged) object model only once, together with a set of poses where it was observed. We evaluate our pipeline on one public dataset and on a newly created Google Tango dataset containing four indoor scenes, with some of the objects appearing multiple times, both within and across scenes.
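    The match-and-merge behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the fixed-size descriptor, the distance threshold, and the merge rule are assumed placeholders standing in for the paper's actual segment description and matching.

    # Minimal sketch of an incremental object database: incoming object-like
    # segments are matched against stored models via a hypothetical descriptor
    # distance and merged when a sufficiently good match is found.
    from dataclasses import dataclass, field
    import numpy as np

    @dataclass
    class ObjectModel:
        points: np.ndarray                          # (N, 3) merged segment geometry
        descriptor: np.ndarray                      # fixed-size shape descriptor (assumed)
        poses: list = field(default_factory=list)   # poses where the object was observed

    class ObjectDatabase:
        def __init__(self, match_threshold=0.2):
            self.models = []
            self.match_threshold = match_threshold

        def insert(self, points, descriptor, pose):
            """Match an incoming segment to existing models; merge, or add a new model."""
            for model in self.models:
                if np.linalg.norm(model.descriptor - descriptor) < self.match_threshold:
                    # Merge: accumulate geometry and remember where it was seen.
                    model.points = np.vstack([model.points, points])
                    model.descriptor = 0.5 * (model.descriptor + descriptor)
                    model.poses.append(pose)
                    return model
            new_model = ObjectModel(points, descriptor, [pose])
            self.models.append(new_model)
            return new_model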

    Dense object-level robotic mapping

    Autonomous robots operating in unstructured real-world settings cannot rely on an a priori map of their surroundings to support navigation and interaction planning; they must perceive the environment and reconstruct their own internal model of the surrounding space. The more sophisticated the task to be automated, the more expressive the acquired model needs to be. Specifically, robots that are to interact with their environment in meaningful ways require maps that extend beyond the traditional monolithic reconstruction of the observed scene geometry: they require maps that enable reasoning about the individual objects in the scene. This thesis addresses the need for richer and more functional environment models by exploring a novel object-level mapping paradigm. The proposed perception pipeline reconstructs dense environment maps augmented with an understanding of the individual semantically meaningful objects found in the scene. The present research contributes to the topic of dense mapping at the level of objects in unstructured settings in two important ways.

    The first contribution of this thesis focuses on building volumetric object-centric maps of the environment in an online, incremental fashion during scanning with a localized RGB-D camera. The proposed pipeline processes incoming frames to identify and segment individual object instances therein and fuses the resulting segmentation information into an incrementally built Truncated Signed Distance Field (TSDF) volume that densely reconstructs the observed scene geometry. The segmentation scheme deployed at each frame combines learning-based instance-aware semantic segmentation with a geometry-based convexity analysis of depth images. Such an approach makes it possible to segment semantically recognized objects from a pre-defined set of classes, as well as unknown object-like elements from previously unseen categories, which are equally relevant for interaction planning in arbitrary real-world settings. Experimental evaluation within a real-world robotic setup demonstrates the ability of the proposed framework to reconstruct environment models that densely describe the geometry of the scene and contain information about the shape and pose of the individual objects therein. Further, the system achieves state-of-the-art 3D instance-aware semantic segmentation performance on a public real-world indoor dataset, while additionally being able to discover novel objects of unknown class and arbitrary shape.

    The second part of this thesis extends the proposed object-level mapping paradigm to dynamic scenes to enable simultaneous tracking and reconstruction of multiple rigid objects of arbitrary shape moving in the foreground. The same geometric-semantic per-frame segmentation scheme is deployed at each incoming RGB-D image to identify individual object instances, and the 6 Degrees of Freedom (DoF) pose of each object is tracked in 3D space via point-to-plane Iterative Closest Point (ICP). A core contribution of this work is a novel object-aware volumetric map representation that can store more than one implicit object surface at each voxel. The first benefit of the proposed formulation is the ability to reconstruct the entire scene and all the objects therein within a single volume. Secondly, and more importantly, the novel map representation allows maintaining accurate surface reconstructions throughout occlusions caused by nearby moving objects. Experiments confirm that the proposed framework can successfully track the pose of multiple moving objects while simultaneously reconstructing their shape, and verify that the novel object-aware volumetric map formulation offers robustness to surface occlusions.

    In all, this thesis validates the central hypothesis that physical objects provide the optimal functional unit for a high-level map of the environment. In the context of dense 3D reconstruction, object-awareness enables reasoning about the shape and pose of individual objects in the scene for autonomously planning high-level interaction tasks. In environments that exhibit dynamics, an object-oriented map representation facilitates tracking and reconstruction of multiple moving objects while addressing challenges such as surface occlusion. The proposed object-level mapping paradigm can both enrich existing methods and give rise to new robotic perception capabilities. Ultimately, the presented results have implications for allowing robots to venture further into the unstructured and ever-evolving real world.
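    For context on the tracking component named above, the point-to-plane ICP objective used to estimate a 6 DoF object pose can be written as a short sketch. The function below only evaluates the error for given correspondences; correspondence search and the actual solver are omitted, and the array shapes and names are assumptions, not taken from the thesis.

    # Point-to-plane ICP error: for corresponding source points p_i, target
    # points q_i and target normals n_i, the tracked pose (R, t) minimizes
    #   sum_i ((R p_i + t - q_i) . n_i)^2
    import numpy as np

    def point_to_plane_error(R, t, src_points, tgt_points, tgt_normals):
        """Sum of squared point-to-plane residuals for given correspondences."""
        transformed = src_points @ R.T + t                                  # (N, 3)
        residuals = np.einsum("ij,ij->i", transformed - tgt_points, tgt_normals)
        return float(np.sum(residuals ** 2))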

    TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction

    The ability to simultaneously track and reconstruct multiple objects moving in the scene is of the utmost importance for robotic tasks such as autonomous navigation and interaction. Virtually all previous attempts to map multiple dynamic objects have evolved to store individual objects in separate reconstruction volumes and to track the relative pose between them. While simple and intuitive, such a formulation does not scale well with the number of objects in the scene and introduces the need for an explicit occlusion handling strategy. In contrast, we propose a map representation that maintains a single volume for the entire scene and all the objects therein. To this end, we introduce a novel multi-object TSDF formulation that can encode multiple object surfaces at any given location in the map. In a multiple dynamic object tracking and reconstruction scenario, our representation allows maintaining accurate reconstructions of surfaces even while they become temporarily occluded by other objects moving in their proximity. We evaluate the proposed TSDF++ formulation on a public synthetic dataset and demonstrate its ability to preserve reconstructions of occluded surfaces when compared to the standard TSDF map representation. Code is available at https://github.com/ethz-asl/tsdf-plusplus.
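    A rough sketch of the core idea, assuming a dictionary-per-voxel layout: each voxel keeps a separate (distance, weight) pair per object ID, so surfaces of several objects can coexist at the same map location. This only illustrates the multi-surface concept; it is not the data structure of the released implementation.

    # Multi-object TSDF voxel: one truncated signed distance layer per object,
    # each fused with the standard weighted running-average update.
    class MultiObjectVoxel:
        def __init__(self, truncation):
            self.truncation = truncation
            self.layers = {}  # object_id -> [tsdf, weight]

        def integrate(self, object_id, sdf_measurement, obs_weight=1.0):
            """Fuse a measurement into the TSDF layer of one object only."""
            d = max(-self.truncation, min(self.truncation, sdf_measurement))
            tsdf, weight = self.layers.get(object_id, [0.0, 0.0])
            new_weight = weight + obs_weight
            self.layers[object_id] = [
                (weight * tsdf + obs_weight * d) / new_weight,
                new_weight,
            ]

        def query(self, object_id):
            """Return the stored distance for an object, or None if unobserved here."""
            entry = self.layers.get(object_id)
            return entry[0] if entry else None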

    Modelify: An approach to incrementally build 3D object models for map completion

    The capabilities of discovering new knowledge and updating previously acquired knowledge are crucial for deploying autonomous robots in unknown and changing environments. Spatial and objectness concepts are at the basis of several robotic functionalities and are part of our intuitive human understanding of the physical world. In this paper, we propose a method, which we call Modelify, to incrementally map the environment at the level of objects in a consistent manner. We follow an approach where no prior knowledge of the environment is required. The only assumption we make is that objects in the environment are separated by concave boundaries. The approach works on an RGB-D camera stream, from which object-like segments are extracted and stored in an incremental database. Segment description and matching exploit both 2D and 3D information, allowing a graph of all segments to be built. Finally, a matching score guides a Markov clustering algorithm to merge segments, thus completing object representations. Our approach allows creating single (merged) instances of repeating objects, of objects that were observed from different viewpoints, and of objects that were observed in previous mapping sessions. Thanks to our matching and merging strategies, this also works with only partially overlapping segments. We perform evaluations on indoor and outdoor datasets recorded with different RGB-D sensors and show the benefit of using a clustering method to form merge candidates and of detecting keypoints in both 2D and 3D. Our new method shows better results than previous approaches while being significantly faster. A newly recorded dataset and the source code are released with this publication.
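    The Markov clustering step mentioned above can be illustrated with a basic MCL pass over a segment-similarity matrix: expansion (matrix power) and inflation (element-wise power followed by renormalization) alternate until the flow matrix converges, and the surviving rows define the clusters of segments to merge. The parameters below are illustrative defaults, not the authors' settings.

    # Basic Markov clustering (MCL) on a symmetric, non-negative similarity matrix.
    import numpy as np

    def markov_cluster(similarity, expansion=2, inflation=2.0, iters=100, tol=1e-6):
        """Return clusters of indices to merge, computed with plain MCL."""
        m = similarity + np.eye(similarity.shape[0])      # self-loops for stability
        m = m / m.sum(axis=0, keepdims=True)              # make columns stochastic
        for _ in range(iters):
            prev = m
            m = np.linalg.matrix_power(m, expansion)      # expansion: spread flow
            m = m ** inflation                            # inflation: sharpen flow
            m = m / m.sum(axis=0, keepdims=True)
            if np.abs(m - prev).max() < tol:
                break
        # Rows that retain mass act as attractors; their non-zero columns form clusters.
        clusters = []
        for row in m:
            members = set(np.nonzero(row > 1e-8)[0])
            if members and members not in clusters:
                clusters.append(members)
        return clusters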